A Logic for Document Spanners
نویسندگان
چکیده
منابع مشابه
A Logic for Document Spanners
Document spanners are a formal framework for information extraction that was introduced by Fagin, Kimelfeld, Reiss, and Vansummeren (PODS 2013, JACM 2015). One of the central models in this framework are core spanners, which are based on regular expressions with variables that are then extended with an algebra. As shown by Freydenberger and Holldack (ICDT 2016), there is a connection between co...
متن کاملRecursive Programs for Document Spanners
A document spanner models a program for Information Extraction (IE) as a function that takes as input a text document (string over a finite alphabet) and produces a relation of spans (intervals in the document) over a predefined schema. A well studied language for expressing spanners is that of the regular spanners: relational algebra over regex formulas, which are obtained by adding capture va...
متن کاملConstant delay algorithms for regular document spanners
Regular expressions and automata models with capture variables are core tools in rule-based information extraction. These formalisms, also called regular document spanners, use regular languages in order to locate the data that a user wants to extract from a text document, and then store this data into variables. Since document spanners can easily generate large outputs, it is important to have...
متن کاملA Document Spanners: A Formal Approach to Information Extraction
An intrinsic part of information extraction is the creation and manipulation of relations extracted from text. In this article, we develop a foundational framework where the central construct is what we call a document spanner (or just spanner for short). A spanner maps an input string into a relation over the spans (intervals specified by bounding indices) of the string. The focus of this arti...
متن کاملDocument Spanners for Extracting Incomplete Information: Expressiveness and Complexity
Rule-based information extraction has lately received a fair amount of attention from the database community, with several languages appearing in the last few years. Although information extraction systems are intended to deal with semistructured data, all language proposals introduced so far are designed to output relations, thus making them incapable of handling incomplete information. To rem...
متن کاملذخیره در منابع من
با ذخیره ی این منبع در منابع من، دسترسی به آن را برای استفاده های بعدی آسان تر کنید
ژورنال
عنوان ژورنال: Theory of Computing Systems
سال: 2018
ISSN: 1432-4350,1433-0490
DOI: 10.1007/s00224-018-9874-1